FLEX(1) UNIX Programmer's Manual FLEX(1)
NAME
flex - fast lexical analyzer generator
SYNOPSIS
flex [ -bdfipstvFILT -c[efmF] -Sskeleton_file ] [ filename ]
DESCRIPTION
flex is a rewrite of lex intended to right some of that
tool's deficiencies: in particular, flex generates lexical
analyzers much faster, and the analyzers use smaller tables
and run faster.
OPTIONS
In addition to lex's -t flag, flex has the following
options:
-b Generate backtracking information to lex.backtrack.
This is a list of scanner states which require back-
tracking and the input characters on which they do so.
By adding rules one can remove backtracking states. If
all backtracking states are eliminated and -f or -F is
used, the generated scanner will run faster (see the -p
flag). Only users who wish to squeeze every last cycle
out of their scanners need worry about this option.
-d makes the generated scanner run in debug mode. Whenever
a pattern is recognized the scanner will write to
stderr a line of the form:
    --accepting rule #n
Rules are numbered sequentially with the first one
being 1. Rule #0 is executed when the scanner backtracks;
Rule #(n+1) (where n is the number of rules)
indicates the default action; Rule #(n+2) indicates
that the input buffer is empty and needs to be refilled
and then the scan restarted. Rules beyond (n+2) are
end-of-file actions.
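
The numbering scheme above can be summarized in a few lines of C. This is
only an illustrative sketch: the function classify_rule and the enum
rule_kind are hypothetical names and are not part of flex; num_rules stands
for the number of rules in the flex input.

```c
#include <assert.h>

/* Hypothetical helper, not part of flex: classify the rule number that a
 * scanner built with -d reports in "--accepting rule #n", given the total
 * number of rules in the flex input. */
enum rule_kind {
    RULE_BACKTRACK,   /* rule #0: the scanner backtracked */
    RULE_USER,        /* rules #1..#n: patterns from the input file */
    RULE_DEFAULT,     /* rule #(n+1): the default action */
    RULE_REFILL,      /* rule #(n+2): input buffer empty, refill and rescan */
    RULE_EOF          /* beyond #(n+2): end-of-file actions */
};

enum rule_kind classify_rule(int reported, int num_rules)
{
    assert(reported >= 0 && num_rules >= 0);

    if (reported == 0)
        return RULE_BACKTRACK;
    if (reported <= num_rules)
        return RULE_USER;
    if (reported == num_rules + 1)
        return RULE_DEFAULT;
    if (reported == num_rules + 2)
        return RULE_REFILL;
    return RULE_EOF;
}
```

For a scanner with 5 rules, a reported rule #6 would therefore be the
default action and rule #7 a buffer refill.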
-f has the same effect as lex's -f flag (do not compress
the scanner tables); the mnemonic changes from fast
compilation to (take your pick) full table or fast
scanner. The actual compilation takes longer, since
flex is I/O bound writing out the big table.
This option is equivalent to -cf (see below).
-i instructs flex to generate a case-insensitive scanner.
The case of letters given in the flex input patterns
will be ignored, and the rules will be matched regardless
of case. The matched text given in yytext will
keep its original case (i.e., it will not be folded).
Printed 2/10/90 20 June 1989 1
-p generates a performance report to stderr. The report
consists of comments regarding features of the flex
input file which will cause a loss of performance in
the resulting scanner. Note that the use of REJECT and
variable trailing context (see BUGS) entails a substantial
performance penalty; use of yymore(), the ^ operator,
and the -I flag entail minor performance penalties.
-s causes the default rule (that unmatched scanner input
is echoed to stdout) to be suppressed. If the scanner
encounters input that does not match any of its rules,
it aborts with an error. This option is useful for
finding holes in a scanner's rule set.
-v has the same meaning as for lex (print to stderr a summary
of statistics of the generated scanner). Many
more statistics are printed, though, and the summary
spans several lines. Most of the statistics are mean-
ingless to the casual flex user, but the first line
identifies the version of flex, which is useful for
figuring out where you stand with respect to patches
and new releases.
-F specifies that the fast scanner table representation
should be used. This representation is about as fast
as the full table representation (-f), and for some
sets of patterns will be considerably smaller (and for
others, larger). In general, if the pattern set con-
tains both "keywords" and a catch-all, "identifier"
rule, such as in the set:
    "case"    return ( TOK_CASE );
    "switch"  return ( TOK_SWITCH );
    ...
    "default" return ( TOK_DEFAULT );
    [a-z]+    return ( TOK_ID );
then you're better off using the full table representa-
tion. If only the "identifier" rule is present and you
then use a hash table or some such to detect the keywords,
you're better off using -F.
This option is equivalent to -cF (see below).
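
A minimal sketch of the "identifier rule plus lookup" approach follows. It
makes several assumptions: a sorted table searched with the standard
bsearch() is used in place of the hash table mentioned above, the numeric
token values are arbitrary, and keyword_or_id is a hypothetical helper name.
Only the TOK_ names come from the rule set shown earlier.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Token codes: the names follow the rule set above; the numeric values
 * are arbitrary and chosen only for this sketch. */
enum { TOK_ID = 256, TOK_CASE, TOK_DEFAULT, TOK_SWITCH };

struct keyword {
    const char *name;
    int token;
};

/* Must stay sorted by name (strcmp order), because bsearch() requires it. */
static const struct keyword keywords[] = {
    { "case",    TOK_CASE },
    { "default", TOK_DEFAULT },
    { "switch",  TOK_SWITCH },
};

/* Comparison callback for bsearch(): key is the text, entry a table row. */
static int compare_keyword(const void *key, const void *entry)
{
    return strcmp((const char *) key,
                  ((const struct keyword *) entry)->name);
}

/* Hypothetical helper: return the keyword's token, or TOK_ID if the text
 * is not a keyword. */
int keyword_or_id(const char *text)
{
    const struct keyword *kw;

    assert(text != NULL);

    kw = bsearch(text, keywords,
                 sizeof keywords / sizeof keywords[0],
                 sizeof keywords[0], compare_keyword);

    return kw ? kw->token : TOK_ID;
}
```

The scanner would then keep only the catch-all rule, with an action along
the lines of "[a-z]+ return ( keyword_or_id( yytext ) );".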
-I instructs flex to generate an interactive scanner.
Normally, scanners generated by flex always look ahead
one character before deciding that a rule has been
matched. At the cost of some scanning overhead, flex
will generate a scanner which only looks ahead when
needed. Such scanners are called interactive because
if you want to write a scanner for an interactive
system such as a command shell, you will probably want
the user's input to be terminated with a newline, and
without -I the user will have to type a character in
addition to the newline in order to have the newline
recognized. This leads to dreadful interactive perfor-
mance.
If all this seems too confusing, here's the general
rule: if a human will be typing in input to your
scanner, use -I, otherwise don't; if you don't care
about how fast your scanners run and don't want to make
any assumptions about the input to your scanner, always
use -I.
Note, -I cannot be used in conjunction with full or
fast tables, i.e., the -f, -F, -cf, or -cF flags.
-L instructs flex to not generate #line directives (see
below).
-T makes flex run in trace mode. It will generate a lot
of messages to stdout concerning the form of the input
and the resultant non-deterministic and deterministic
finite automatons. This option is mostly for use in
maintaining flex.
-c[efmF]
controls the degree of table compression. -ce directs
flex to construct equivalence classes, i.e., sets of
characters which have identical lexical properties (for
example, if the only appearance of digits in the flex
input is in the character class "[0-9]" then the digits
'0', '1', ..., '9' will all be put in the same
equivalence class). -cf specifies that the full
scanner tables should be generated - flex should not
compress the tables by taking advantage of similar
transition functions for different states. -cF specifies
that the alternate fast scanner representation
(described above under the -F flag) should be used.
-cm directs flex to construct meta-equivalence classes,
which are sets of equivalence classes (or characters,
if equivalence classes are not being used) that are
commonly used together. A lone -c specifies that the
scanner tables should be compressed but neither
equivalence classes nor meta-equivalence classes should
be used.
The options -cf or -cF and -cm do not make sense
together - there is no opportunity for meta-equivalence
classes if the table is not being compressed. Other-
wise the options may be freely mixed.
The default setting is -cem which specifies that flex
should generate equivalence classes and meta-
equivalence classes. This setting provides the highest
degree of table compression. You can trade larger tables
for faster-executing scanners, with the following
generally being true:
    slowest            smallest
               -cem
               -ce
               -cm
               -c
               -c{f,F}e
               -c{f,F}
    fastest            largest
Note that scanners with the smallest tables compile the
quickest, so during development you will usually want
to use the default, maximal compression.
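
The idea behind the equivalence classes built by -ce can be sketched in C.
This is a simplified model and not flex's actual algorithm: it assumes the
scanner's patterns have been reduced to a list of character sets, and it
puts two characters in the same class exactly when they appear in the same
sets. The names build_equiv_classes and NCHARS are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define NCHARS 128   /* 7-bit ASCII only, to keep the sketch small */

/* Hypothetical helper, not flex's actual algorithm: partition the
 * characters 0..127 so that two characters share a class exactly when
 * they belong to the same subset of the given character sets.  Each set
 * is a NUL-terminated string listing its members; at most 32 sets.
 * class_of[c] receives the class number of c; returns the class count. */
int build_equiv_classes(const char *const sets[], int nsets,
                        int class_of[NCHARS])
{
    unsigned long signature[NCHARS];  /* bit i set: char is in sets[i] */
    int nclasses = 0;
    int c, d, i;

    assert(nsets >= 0 && nsets <= 32);

    for (c = 0; c < NCHARS; c++) {
        signature[c] = 0;
        for (i = 0; i < nsets; i++)
            /* Skip NUL: strchr() would match the string terminator. */
            if (c != 0 && strchr(sets[i], c) != NULL)
                signature[c] |= 1UL << i;
    }

    for (c = 0; c < NCHARS; c++) {
        /* Reuse the class of an earlier character with the same signature. */
        for (d = 0; d < c; d++)
            if (signature[d] == signature[c])
                break;
        class_of[c] = (d < c) ? class_of[d] : nclasses++;
    }
    return nclasses;
}
```

With the two sets "0123456789" and "abcdefghijklmnopqrstuvwxyz", all digits
land in one class, all lower-case letters in another, and every remaining
character in a third, so a table indexed by class needs 3 columns rather
than 128.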
-Sskeleton_file
overrides the default skeleton file from which flex
constructs its scanners. You'll never need this option
unless you are doing flex maintenance or development.
INCOMPATIBILITIES WITH LEX
flex is fully compatible with lex with the following
exceptions:
- There is no run-time library to link with. You needn't
specify -ll when linking, and you must supply a main
program. (Hacker's note: since the lex library con-
tains a main() which simply calls yylex(), you actually
can be lazy and not supply your own main program and
link with -ll.)
- lex's %r (Ratfor scanners) and %t (translation table)
options are not supported.
- The do-nothing -n flag is not supported.
- When definitions are expanded, flex encloses them in
parentheses. With lex, the following
    NAME [A-Z][A-Z0-9]*
    %%
    foo{NAME}? printf( "Found it\n" );
    %%
will not match the string "foo" because when the macro
is expanded the rule is equivalent to "foo[A-Z][A-Z0-9]*?"
and the precedence is such that the '?' is
associated with "[A-Z0-9]*". With flex, the rule will
be expanded to "foo([A-Z][A-Z0-9]*)?" and so the string
"foo" will match. Note that because of this, the ^, $,
<s>, and / operators cannot be used in a definition.
- The undocumented lex-scanner internal variable yylineno
is not supported.
- The input() routine is not redefinable, though may be
called to read characters following whatever has been
matched by a rule. If input() encounters an end-of-
file the normal yywrap() processing is done. A
"real" end-of-file is returned as EOF.
Input can be controlled by redefining the YY_INPUT
macro. YY_INPUT's calling sequence is
"YY_INPUT(buf,result,max_size)". Its action is to
place up to max_size characters in the character buffer
"buf" and return in the integer variable "result"
either the number of characters read or the constant
YY_NULL (0 on Unix systems) to indicate EOF.
The default YY_INPUT reads from the file-pointer "yyin"
(which is by default stdin), so if you just want to
change the input file, you needn't redefine YY_INPUT -
just point yyin at the input file.
A sample redefinition of YY_INPUT (in the first section
of the input file):
    %{
    #undef YY_INPUT
    #define YY_INPUT(buf,result,max_size) \
        result = (buf[0] = getchar()) == EOF ? YY_NULL : 1;
    %}
You can also add in things like keeping track
of the input line number this way; but don't expect
your scanner to go very fast.
- output() is not supported. Output from the ECHO macro
is done to the file-pointer "yyout" (default stdout).
- If you are providing your own yywrap() routine, you
must "#undef yywrap" first.
- To refer to yytext outside of your scanner source file,
use "extern char *yytext;" rather than "extern char
yytext[];".
- yyleng is a macro and not a variable, and hence cannot
be accessed outside of the scanner source file.
- flex reads only one input file, while lex's input is
made up of the concatenation of its input files.
- The name FLEX_SCANNER is #define'd so scanners may be
written for use with either flex or lex.
- The macro YY_USER_ACTION can be redefined to provide an
action which is always executed prior to the matched
rule's action. For example, it could be #define'd to
call a routine to convert yytext to lower-case, or to
copy yyleng to a global variable to make it accessible
outside of the scanner source file.
- In the generated scanner, rules are separated using
YY_BREAK instead of simple "break"'s. This allows, for
example, C++ users to #define YY_BREAK to do nothing
(while being very careful that every rule ends with a
"break" or a "return"!) to avoid suffering from
unreachable statement warnings where a rule's action
ends with "return".
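
The YY_INPUT contract described earlier in this list (place up to max_size
characters in buf and report how many were read, or YY_NULL at end of
input) can also be met by a function that reads from memory rather than
from a file. The following is a sketch under stated assumptions:
set_mem_input, read_mem_input, and the two static variables are
hypothetical names, and the function returns 0 at end of input, which is
the value given above for YY_NULL on Unix systems.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory input source; these names are not part of flex. */
static const char *mem_input = "";  /* text still to be scanned */
static size_t mem_left = 0;         /* characters remaining in mem_input */

/* Select the NUL-terminated string that the scanner should read next. */
void set_mem_input(const char *text)
{
    assert(text != NULL);
    mem_input = text;
    mem_left = strlen(text);
}

/* Copy up to max_size characters into buf and return the count, or 0
 * (the value the text gives for YY_NULL on Unix) once input is exhausted. */
int read_mem_input(char *buf, int max_size)
{
    size_t n;

    assert(buf != NULL && max_size > 0);

    n = mem_left < (size_t) max_size ? mem_left : (size_t) max_size;
    memcpy(buf, mem_input, n);
    mem_input += n;
    mem_left -= n;
    return (int) n;
}
```

In the first section of the flex input, YY_INPUT could then be redefined
along these lines:

    #undef YY_INPUT
    #define YY_INPUT(buf,result,max_size) \
        result = read_mem_input( buf, max_size );

Unlike the one-character getchar() sample, this fills the buffer in bulk,
so it should not suffer the same slowdown.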
ENHANCEMENTS
- Exclusive start-conditions can be declared by using %x
instead of %s. These start-conditions have the property
that when they are active, no other rules are active.
Thus a set of rules governed by the same exclusive
start condition describes a scanner which is independent
of any of the other rules in the flex input. This
feature makes it easy to specify "mini-scanners" which
scan portions of the input that are syntactically dif-
ferent from the rest (e.g., comments).
- yyterminate() can be used in lieu of a return statement
in an action. It terminates the scanner and returns a
0 to the scanner's caller, indicating "all done".
- End-of-file rules. The special rule "<<EOF>>" indicates
actions which are to be taken when an end-of-file is
encountered and yywrap() returns non-zero (i.e., indi-
cates no further files to process). The action can
either point yyin at a new file to process, in which
case the action should finish with YY_NEW_FILE (this is
a branch, so subsequent code in the action won't be
executed), or it should finish with a return statement.
<<EOF>> rules may not be used with other patterns; they
may only be qualified with a list of start conditions.
If an unqualified <<EOF>> rule is given, it applies
only to the INITIAL start condition, and not to %s
start conditions. These rules are useful for catching
things like unclosed comments. An example:
    %x quote
    %%
    ...
    <quote><<EOF>> {
                   error( "unterminated quote" );
                   yyterminate();
                   }
    <<EOF>>        {
                   yyin = fopen( next_file, "r" );
                   YY_NEW_FILE;
                   }
- flex dynamically resizes its internal tables, so direc-
tives like "%a 3000" are not needed when specifying
large scanners.
- The scanning routine generated by flex is declared
using the macro YY_DECL. By redefining this macro you
can change the routine's name and its calling sequence.
For example, you could use:
    #undef YY_DECL
    #define YY_DECL float lexscan( a, b ) float a, b;
to give it the name lexscan, returning a float, and
taking two floats as arguments. Note that if you give
arguments to the scanning routine, you must terminate
the definition with a semi-colon (;).
- flex generates #line directives mapping lines in the
output to their origin in the input file.
- You can put multiple actions on the same line,
separated with semi-colons. With lex, the following
    foo handle_foo(); return 1;
is truncated to
    foo handle_foo();
flex does not truncate the action. Actions that are
not enclosed in braces are terminated at the end of the
line.
- Actions can be begun with %{ and terminated with %}. In
this case, flex does not count braces to figure out
where the action ends - actions are terminated by the
closing %}. This feature is useful when the enclosed
action has extraneous braces in it (usually in comments
or inside inactive #ifdef's) that throw off the brace-
count.
- All of the scanner actions (e.g., ECHO, yywrap ...)
except the unput() and input() routines, are written as
macros, so they can be redefined if necessary without
requiring a separate library to link to.
- When yywrap() indicates that the scanner is done pro-
cessing (it does this by returning non-zero), on subse-
quent calls the scanner will always immediately return
a value of 0. To restart it on a new input file, the
action yyrestart() is used. It takes one argument, the
new input file. It closes the previous yyin (unless
stdin) and sets up the scanner's internal variables so
that the next call to yylex() will start scanning the
new file. This functionality is useful for, e.g., pro-
grams which will process a file, do some work, and then
get a message to parse another file.
- Flex scans the code in section 1 (inside %{}'s) and the
actions for occurrences of REJECT and yymore(). If it
doesn't see any, it assumes the features are not used
and generates higher-performance scanners. Flex tries
to be correct in identifying uses but can be fooled
(for example, if a reference is made in a macro from a
#include file). If this happens (a feature is used and
flex didn't realize it) you will get a compile-time
error of the form
    reject_used_but_not_detected undefined
You can tell flex that a feature is used even if it
doesn't think so with %used followed by the name of the
feature (for example, "%used REJECT"); similarly, you
can specify that a feature is not used even though it
thinks it is with %unused.
- Comments may be put in the first section of the input
by preceding them with '#'.
FILES
flex.skel
    skeleton scanner
lex.yy.c
    generated scanner (called lexyy.c on some systems).
lex.backtrack
    backtracking information for -b flag (called lex.bck on
    some systems).
SEE ALSO
lex(1)
M. E. Lesk and E. Schmidt, LEX - Lexical Analyzer Generator
AUTHOR
Vern Paxson, with the help of many ideas and much inspira-
tion from Van Jacobson. Original version by Jef Poskanzer.
Fast table representation is a partial implementation of a
design done by Van Jacobson. The implementation was done by
Kevin Gong and Vern Paxson.
Thanks to the many flex beta-testers and feedbackers, espe-
cially Casey Leedom, Frederic Brehm, Nick Christopher, Chris
Faylor, Eric Goldman, Eric Hughes, Greg Lee, Craig Leres,
Mohamed el Lozy, Jim Meyering, Esmond Pitt, Jef Poskanzer,
and Dave Tallman. Thanks to Keith Bostic, John Gilmore, Bob
Mulcahy, Rich Salz, and Richard Stallman for help with vari-
ous distribution headaches.
Send comments to:
Vern Paxson
Real Time Systems
Bldg. 46A
Lawrence Berkeley Laboratory
1 Cyclotron Rd.
Berkeley, CA 94720
(415) 486-6411
vern@csam.lbl.gov
vern@rtsg.ee.lbl.gov
ucbvax!csam.lbl.gov!vern
I will be gone from mid-July '89 through mid-August '89.
From August on, the addresses are:
vern@cs.cornell.edu
Vern Paxson
CS Department
Grad Office
4126 Upson
Cornell University
Ithaca, NY 14853-7501
<no phone number yet>
Email sent to the former addresses should continue to be
forwarded for quite a while. Also, it looks like my user-
name will be "paxson" and not "vern". I'm planning on hav-
ing a mail alias set up so "vern" will still work, but if
you encounter problems try "paxson".
DIAGNOSTICS
flex scanner jammed - a scanner compiled with -s has encountered
an input string which wasn't matched by any of its
rules.
flex input buffer overflowed - a scanner rule matched a
string long enough to overflow the scanner's internal input
buffer (16K bytes - controlled by YY_BUF_MAX in
"flex.skel").
old-style lex command ignored - the flex input contains a
lex command (e.g., "%n 1000") which is being ignored.
BUGS
Some trailing context patterns cannot be properly matched
and generate warning messages ("Dangerous trailing con-
text"). These are patterns where the ending of the first
part of the rule matches the beginning of the second part,
such as "zx*/xy*", where the 'x*' matches the 'x' at the
beginning of the trailing context. (Lex doesn't get these
patterns right either.) If desperate, you can use yyless()
to effect arbitrary trailing context.
Variable trailing context (where both the leading and trailing
parts do not have a fixed length) entails the same
performance loss as REJECT (i.e., substantial).
For some trailing context rules, parts which are actually
fixed-length are not recognized as such, leading to the
abovementioned performance loss. In particular, parts using
'|' or {n} are always considered variable-length.
Use of unput() or input() trashes the current yytext and
yyleng.
Use of unput() to push back more text than was matched can
result in the pushed-back text matching a beginning-of-line
('^') rule even though it didn't come at the beginning of
the line.
yytext and yyleng cannot be modified within a flex action.
Nulls are not allowed in flex inputs or in the inputs to
scanners generated by flex. Their presence generates fatal
errors.
Flex does not generate correct #line directives for code
internal to the scanner; thus, bugs in flex.skel yield bogus
line numbers.
Pushing back definitions enclosed in ()'s can result in
nasty, difficult-to-understand problems like:
    {DIG} [0-9] /* a digit */
In which the pushed-back text is "([0-9] /* a digit */)".
Due to both buffering of input and read-ahead, you cannot
intermix calls to stdio routines such as getchar()
with flex rules and expect it to work. Call
input() instead.
The total table entries listed by the -v flag excludes the
number of table entries needed to determine what rule has
been matched. The number of entries is equal to the number
of DFA states if the scanner does not use REJECT, and some-
what greater than the number of states if it does.
To be consistent with ANSI C, the escape sequence \xhh
should be recognized for hexadecimal escape sequences, such
as '\x41' for 'A'.
It would be useful if flex wrote to lex.yy.c a summary of
the flags used in its generation (such as which table
compression options).
The scanner run-time speeds still have not been optimized as
much as they deserve. Van Jacobson's work shows that they
can go faster still.
The utility needs more complete documentation.